Abstract

Even though people in our contemporary technological society depend on communication, the
underlying laws of human communication behavior remain poorly understood. Here we investigate
the communication patterns in two social Internet communities in search of statistical laws in
human interaction activity. This research reveals that human communication networks dynamically
follow scaling laws that may also explain the observed trends in economic growth. Specifically,
we identify a generalized version of Gibrat's law of social activity, expressed as a scaling law
between the fluctuations in the number of messages sent by members and their level of activity.
Gibrat's law has been essential in understanding economic growth patterns, yet without an
underlying general principle for its origin. We attribute this scaling law to long-term
correlation patterns in human activity, which surprisingly span from days to the entire period
of the available data of more than one year. Further, we provide a mathematical framework that
relates the generalized version of Gibrat's law to the long-term correlated dynamics, which
suggests that the same underlying mechanism could be the source of Gibrat's law in economics,
ranging from large firms, research and development expenditures, and gross domestic product of
countries to city population growth. These findings are also of importance for designing
communication networks and for understanding the dynamics of social systems in which
communication plays a role, such as economic markets and political systems.






I. INTRODUCTION 



The question of whether unforeseen outcomes of social activity follow emergent statistical
laws has been an acknowledged problem in the social sciences since at least the last decade of
the 19th century. Earlier discoveries include Pareto's law for income distributions, Zipf's law,
initially applied to word frequency in texts and later extended to firms, cities and others,
and Gibrat's law of proportionate growth in economics [3, 9].

Social networks evolve permanently, and Internet communities grow every day. Access to the
communication patterns of Internet users opens the possibility of unveiling the origins of
statistical laws, leading to a better understanding of human behavior as a whole. In this
paper, we analyze the dynamics of sending messages in two Internet communities in search of
statistical laws of human communication activity. The first online community (OC1) is mainly
used by men who have sex with men (MSM) [38]. The data consist of over 80,000 members and more
than 12.5 million messages sent during 63 days. The target group of the second online community
(OC2) is teenagers [10]. The data cover 492 days of activity with more than 500,000 messages
sent among almost 30,000 members. Both web sites are also used for social interaction in
general. All data are completely anonymous, lack any message content, and consist only of the
times when the messages are sent and the identification numbers of the senders and receivers.

The act of writing and sending messages is an example of an intentional social action.
In contrast to routinized behavior, the actants are aware of the purpose of their actions.
Nevertheless, the emergent properties of the collective behavior of the actants are
unintended. In Fig. 1a we show a typical example of the activity of a member of OC1,
depicting the times when the member sends messages. Figure 1b provides the cumulative
number of messages sent (green curve) compared with a random surrogate data set (brown
curve) obtained by shuffling the data, as discussed below. As would be expected, there are
large fluctuations in the members' activity when compared with a random signal [12, 13, 15].
The messages sent at random display small temporal fluctuations, while the OC1 member sends
many more messages in the beginning and far fewer at the end of the period of data
acquisition (as also seen in Fig. 1c, displaying the number of messages sent per day).



[38] The study of the de-identified MSM dating site network data was approved by the Regional Ethical Review 
board in Stockholm, record 2005/5:3. 
While such extreme events or bursts have been documented for many systems, including
e-mail and letter post communication, instant messaging, web browsing and movie watching
[11, 12, 13, 14, 15], their origin is still an open question.



II. RESULTS 

Growth in the number of messages 

The cumulative number, $m_j(t)$, expresses how many messages have been sent by a certain
member $j$ up to a given time $t$ [for better readability we do not write the index $j$
explicitly and use $m(t)$; see details on the notation in the Supporting Information (SI)
Sec. I]. The dynamics of $m(t)$ between times $t_0$ and $t_1$ within the period of data
acquisition $T$ ($t_0 < t_1 \le T$) can be considered as a growth process, where each member
exhibits a specific growth rate $r_j$ ($r$ for short notation):

$$ r = \ln \frac{m_1}{m_0} , \tag{1} $$

where $m_0 = m(t_0)$ and $m_1 = m(t_1)$ are the numbers of messages sent until $t_0$ and $t_1$,
respectively, by every member. To characterize the dynamics of the activity, we consider
two measures. (i) The conditional average growth rate, $\langle r(m_0) \rangle$, quantifies the
average growth of the number of messages sent by the members between $t_0$ and $t_1$ depending
on the initial number of messages, $m_0$. In other words, we consider the average growth rate
of only those members that have sent $m_0$ messages until $t_0$ (see Methods, Sec. IV, for more
details). (ii) The conditional standard deviation of the growth rate for those members that
have sent $m_0$ messages until $t_0$,
$\sigma(m_0) = \sqrt{\langle [r(m_0) - \langle r(m_0) \rangle]^2 \rangle}$, expresses the
statistical spread or fluctuation of growth among the members depending on $m_0$. Both
quantities are relevant in the context of Gibrat's law in economics, which proposes a
proportionate growth process entailing the assumption that the average and the standard
deviation of the growth rate of a given economic indicator are constant and independent of the
specific indicator value. That is, both $\langle r(m_0) \rangle$ and $\sigma(m_0)$ are
independent of $m_0$.
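The two conditional measures can be illustrated with a minimal sketch. It assumes that the message counts $m_0$ and $m_1$ of all members are available as arrays and that members are grouped into logarithmic bins of $m_0$; the function name, the binning scheme and the number of bins are illustrative choices and are not taken from the analysis described here.

```python
import numpy as np

def conditional_growth_stats(m0, m1, n_bins=10):
    """Estimate <r(m0)> and sigma(m0) on logarithmic bins of m0 (binning is an assumption)."""
    m0 = np.asarray(m0, dtype=float)
    m1 = np.asarray(m1, dtype=float)
    keep = m0 > 0                          # r is undefined for members with m0 = 0
    m0, m1 = m0[keep], m1[keep]
    r = np.log(m1 / m0)                    # growth rate, Eq. (1)
    edges = np.logspace(np.log10(m0.min()), np.log10(m0.max()), n_bins + 1)
    idx = np.clip(np.digitize(m0, edges) - 1, 0, n_bins - 1)
    centers, mean_r, sigma_r = [], [], []
    for b in range(n_bins):
        sel = idx == b
        if sel.sum() < 2:                  # skip bins with too few members
            continue
        centers.append(np.sqrt(edges[b] * edges[b + 1]))  # geometric bin center
        mean_r.append(r[sel].mean())       # conditional average growth rate
        sigma_r.append(r[sel].std())       # conditional standard deviation
    return np.array(centers), np.array(mean_r), np.array(sigma_r)
```

Logarithmic bins are used in this sketch because $m_0$ spans several orders of magnitude; any other grouping of members by $m_0$ would serve the same purpose.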

In Fig. 2a,b we show the results of $\langle r(m_0) \rangle$ and $\sigma(m_0)$ versus $m_0$ for
both online communities. We find that the conditional average growth rate is fairly independent
of $m_0$. On the other hand, the standard deviation decreases as a power law of the form:

$$ \sigma(m_0) \sim m_0^{-\beta} . \tag{2} $$

By least-squares fitting we obtain the exponents $\beta_{\rm OC1} = 0.22 \pm 0.01$ for OC1 and
$\beta_{\rm OC2} = 0.17 \pm 0.03$ for OC2 (the values deviate slightly for large $m_0$ due to
low statistics). Although the web sites are used by different member populations, the power law
and the obtained exponents are quite similar. The exponents are also close to those reported
for growth in economic systems such as firms and countries (0.15-0.18, [16]), research and
development expenditures at universities (0.25, [17]), scientific output (0.28-0.4, [18]), and
city population growth (0.19-0.27, [3]). The approximate agreement between the exponents
obtained for very different systems (social or of human origin) can be considered as a
generalization of Gibrat's law, suggesting that the mechanisms behind the growth properties in
different systems may originate in the human activity represented by Eq. (2).
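A least-squares estimate of the exponent in Eq. (2) can be sketched as a straight-line fit in double-logarithmic coordinates. The function name is hypothetical, and the sketch does not reproduce the error estimates quoted above.

```python
import numpy as np

def fit_power_law_exponent(m0, sigma):
    """Fit log(sigma) = log(a) - beta * log(m0) by least squares; return (beta, a)."""
    x = np.log(np.asarray(m0, dtype=float))
    y = np.log(np.asarray(sigma, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)   # straight line in log-log coordinates
    return -slope, np.exp(intercept)

# Usage with synthetic values that follow sigma = 2 * m0**(-0.22) exactly.
m0_values = np.array([1.0, 10.0, 100.0, 1000.0])
beta, prefactor = fit_power_law_exponent(m0_values, 2.0 * m0_values ** -0.22)
print(beta, prefactor)
```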

Figures 2c and 2d depict the results when we randomize the data of OC1 and OC2,
respectively (see Sec. IV for details of the randomization procedure), such that any temporal
correlations are removed. The typical dynamics of such a surrogate data set are shown in
Fig. 1b (brown curve), displaying a clear random pattern of small fluctuations in comparison
with the larger fluctuations of the original data (green curve). We find that the random
signal displays a close to constant average growth rate $\langle r(m_0) \rangle$ and that the
fluctuations behave as in Eq. (2) but with an exponent $\beta_{\rm rnd} = 1/2$ (Fig. 2c,d). The
origin of this value has a simple explanation: if an isolated individual randomly flips an
ideal coin with no memory of the previous attempt, then the fluctuations from the expected
value of the fraction of obtained heads decay as the square root of the number of throws,
implying $\beta_{\rm rnd} = 1/2$. In contrast to randomness, here we hypothesize that the
origin of the generalized version of Gibrat's law with $\beta < 1/2$ in Eq. (2) is a
non-trivial long-term correlation in communication activity. These correlations possibly arise
from internal and external stimuli from other members transmitted through the highly connected
network of individuals, an effect that is absent in the randomized data. The exponent value of
$\beta \approx 0.2$ for OC1 and OC2 implies that the fluctuations of very active members are
smaller than those of less active members, but significantly larger than in the random case
(compare Fig. 2a,b with Fig. 2c,d).
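The coin-flip argument can be checked numerically. The following sketch simulates repeated series of fair coin throws and fits how the spread of the fraction of heads decays with the number of throws; the number of trials, the seed and the function name are arbitrary illustrative choices.

```python
import numpy as np

def heads_fraction_std(n_throws, n_trials=20000, seed=0):
    """Standard deviation of the fraction of heads over many series of fair coin throws."""
    rng = np.random.default_rng(seed)
    heads = rng.binomial(n_throws, 0.5, size=n_trials)
    return (heads / n_throws).std()

throws = np.array([10, 100, 1000, 10000])
spread = np.array([heads_fraction_std(n) for n in throws])
# Slope of log(spread) versus log(throws); expected to be close to -1/2,
# i.e. an exponent of 1/2 in the sense of Eq. (2).
slope = np.polyfit(np.log(throws), np.log(spread), 1)[0]
print(slope)
```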
Long-term correlations

The exceptional quality of the data (more than 10 million messages spanning several
effective decades of magnitude in terms of both activity and time) allows us to test the above
hypothesis by investigating the presence of temporal correlations in the individuals' activity.
We aggregate the data to records of messages per day (an example is shown in Fig. 1c) to
avoid the daily cycle in the activity and analyze the number of messages sent by individuals
per day, $\mu(t)$, where $t$ denotes the day [$m(t) = \sum_{t'=1}^{t} \mu(t')$; Figs. 1d-f show
the color-coded daily activity of three members in OC1]. For every member we obtain a record
with a length of 63 days (OC1) or 492 days (OC2). We note that former studies reporting Eq. (2),
such as [17, 18, 19], typically were not based on data with the temporal resolution used here
and therefore were not able to investigate its origin in terms of temporal correlations.

We quantify the temporal correlations in the members' activity by mapping the problem
to a one-dimensional random walk. The quantity
$Y(t) = \sum_{t'=1}^{t} \left[ \mu(t') - \langle \mu(t) \rangle \right]$, where
$\langle \mu(t) \rangle$ is the average of the corresponding record $\mu(t)$, represents the
position of the random walker that performs an up or down step given by
$\mu(t') - \langle \mu(t) \rangle$ at time step $t'$. The correlations after $\Delta t$ steps
are reflected in the behavior of the root-mean-square displacement
$F(\Delta t) = \sqrt{\langle [Y(t + \Delta t) - Y(t)]^2 \rangle}$, where $\langle \cdot \rangle$
is the average over $t$ and members. If the activity $\mu(t)$ is uncorrelated or short-term
correlated, then one obtains $F(\Delta t) \sim (\Delta t)^{1/2}$, Fick's law of diffusion,
after some cross-over time. In the case of long-term correlations, the result is a power-law
increase

$$ F(\Delta t) \sim (\Delta t)^{H} , \tag{3} $$

where $H > 1/2$ is the fluctuation exponent (also known as Hurst exponent [20]). In statistical
physics, long-term correlation or persistence is also referred to as long-term "memory".
Since, in general, the records might be affected by trends, we use the standard Detrended
Fluctuation Analysis (DFA) [21] to calculate $H$ (see SI Sec. III for a detailed description).
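To make the procedure concrete, the following is a minimal sketch of a first-order detrended fluctuation analysis applied to a single record. It assumes non-overlapping windows and a polynomial detrending in each window; the function names and the choice of scales are illustrative, and the detailed implementation described in the SI may differ.

```python
import numpy as np

def dfa(record, scales, order=1):
    """Detrended fluctuation function F(s) of one record for the given window sizes."""
    x = np.asarray(record, dtype=float)
    profile = np.cumsum(x - x.mean())              # random-walk position Y(t)
    fluct = []
    for s in scales:
        n_seg = len(profile) // s                  # number of non-overlapping windows
        segments = profile[: n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        sq = []
        for seg in segments:
            trend = np.polyval(np.polyfit(t, seg, order), t)   # local polynomial trend
            sq.append(np.mean((seg - trend) ** 2))             # detrended variance
        fluct.append(np.sqrt(np.mean(sq)))
    return np.array(fluct)

def fluctuation_exponent(scales, fluct):
    """Slope of log F versus log s, an estimate of H in Eq. (3)."""
    return np.polyfit(np.log(scales), np.log(fluct), 1)[0]
```

For an uncorrelated record the estimated exponent should be close to 1/2, while long-term correlated records yield larger values.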

The results for OC1 are shown in Figs. 3a,b, where we calculate Eq. (3) by separating
the members into groups with different total numbers of messages sent, $M$.
We find that $F(\Delta t)$ asymptotically follows a power law with $H \approx 1/2$ for the less
active members who sent fewer than 10 messages in the entire period ($M < 10$). The dynamics of
the more active members display clear long-term correlations. We find that the fluctuation
exponent increases to $H \approx 0.75$ for the most active members (see Fig. 3b). The smaller
value of $H$ for less active members could be due to the small amount of information that these
members provide in the available time of data acquisition. When we shuffle the data to
remove any temporal correlations, we obtain the random exponent $H_{\rm rnd} = 1/2$ (as seen in
Fig. 3b), confirming that the correlations in the data are due to temporal structure.

The dynamics of the message activity in OC2 are similar to those in OC1 (see Fig. 3c,d). On
large time scales we measure a fluctuation exponent increasing from $H \approx 1/2$ to
$H \approx 0.9$ with increasing $M$ (the exponents for very active members are based on poor
statistics and therefore carry large error bars). Analogous to the results obtained for OC1,
there are no correlations in the shuffled records ($H_{\rm rnd} = 1/2$ in Fig. 3d). The fact
that $H > 1/2$ means that a sudden burst in the activity of a member persists on time scales
ranging from days to years. The distribution of activity is self-similar over time. Similar
correlation results have been found in traded values of stocks and e-mail data [22].



Relation between $\beta$ and $H$

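Next, we elaborate the mathematical framework that relates the growth process, Eq. (2), to
the long-term correlations, Eq. (3). To relate the exponent $\beta$ from Eq. (2) to the temporal
correlation exponent $\gamma$ from Eq. (4), and therefore to $H$, one can first rewrite Eq. (1)
as:

$$ r = \ln \frac{m_1}{m_0} = \ln \frac{m_0 + \Delta m}{m_0} \quad \text{with} \quad \Delta m = m_1 - m_0 $$

$$ \phantom{r} = \ln \left( 1 + \frac{\Delta m}{m_0} \right) \approx \frac{\Delta m}{m_0} \quad \text{for small} \quad \frac{\Delta m}{m_0} . $$

Next, the total increment of messages $\Delta m$ is expressed in terms of smaller increments
$\mu(t)$, such as messages per day:

$$ \Delta m = \sum_{t = t_0 + 1}^{t_0 + \Delta t} \mu(t) , $$

which is (assuming stationarity) statistically equivalent to
$\Delta m = \sum_{t=1}^{\Delta t} \mu(t)$, so that we can write
$r \approx \frac{1}{m_0} \sum_{t=1}^{\Delta t} \mu(t)$ for the growth rate. The conditional
average growth is then

$$ \langle r(m_0) \rangle \approx \frac{1}{m_0} \left\langle \sum_{t=1}^{\Delta t} \mu(t) \right\rangle = \frac{1}{m_0} \sum_{t=1}^{\Delta t} \langle \mu(t) \rangle . $$

Then, the conditional standard deviation
$\sigma(m_0) = \sqrt{\langle [r(m_0) - \langle r(m_0) \rangle]^2 \rangle}$ can be written
in terms of the auto-correlation function as follows:

$$ r(m_0) - \langle r(m_0) \rangle = \frac{1}{m_0} \left( \sum_{t=1}^{\Delta t} \mu(t) - \sum_{t=1}^{\Delta t} \langle \mu(t) \rangle \right) $$

$$ \left\langle [r(m_0) - \langle r(m_0) \rangle]^2 \right\rangle = \frac{\sigma_\mu^2}{m_0^2} \sum_{i=1}^{\Delta t} \sum_{j=1}^{\Delta t} C(j - i) , $$

where $C(\Delta t) = \frac{1}{\sigma_\mu^2} \left\langle [\mu(t) - \langle \mu(t) \rangle] [\mu(t + \Delta t) - \langle \mu(t) \rangle] \right\rangle$
is the auto-correlation function of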

$\mu(t)$ and $\sigma_\mu$ is the standard deviation of $\mu(t)$. The auto-correlation function
$C(\Delta t)$ measures the interdependencies between the values of the record $\mu(t)$. For
uncorrelated values, $C(\Delta t)$ is zero for $\Delta t > 0$, because on average positive and
negative products of the record cancel each other out. In the case of short-term correlations,
$C(\Delta t)$ has a characteristic decay time, $\Delta t_\times$. A prominent example is the
exponential decay $C(\Delta t) \sim \exp(-\Delta t / \Delta t_\times)$.
Long-term correlations are described by a slower decay, namely a power law,

$$ C(\Delta t) \sim (\Delta t)^{-\gamma} , \tag{4} $$

with the correlation exponent $0 < \gamma < 1$, which is related to the fluctuation exponent
$H$ from Eq. (3) by $\gamma = 2 - 2H$ [20]. We note that $\gamma = 1$ (or $\gamma > 1$)
corresponds to an uncorrelated record with $H = 1/2$. A key property of long-term correlations
is a pronounced mountain-valley structure in the records [20]. Statistically, large values of
$\mu(t)$ are likely to be followed by large values and small values by small ones. Ideally,
this holds on all time scales, which means that a sequence in daily, weekly or monthly
resolution is correlated in the same way as the original sequence.
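The auto-correlation function of a single record can be estimated directly from its definition. The sketch below uses a simple estimator normalized by the variance of the record; the function name is hypothetical and no bias correction for short records is attempted.

```python
import numpy as np

def autocorrelation(record, max_lag):
    """Estimate C(lag) for lag = 0..max_lag, normalized so that C(0) = 1."""
    x = np.asarray(record, dtype=float)
    dx = x - x.mean()                      # deviations from the record average
    var = np.mean(dx ** 2)                 # variance of the record
    return np.array([np.mean(dx[: len(dx) - lag] * dx[lag:]) / var
                     for lag in range(max_lag + 1)])
```

For an uncorrelated record the estimate fluctuates around zero for all positive lags, whereas a short-term correlated record shows a rapid decay with a characteristic time.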

Assuming long-term correlations asymptotically decaying as in Eq. (4), we approximate
the double sum with integrals and obtain:

$$ \left\langle [r(m_0) - \langle r(m_0) \rangle]^2 \right\rangle \sim \frac{\sigma_\mu^2}{m_0^2} \int_1^{\Delta t} \int_1^{\Delta t} |j - i|^{-\gamma} \, dj \, di \sim \frac{\sigma_\mu^2}{m_0^2} (\Delta t)^{2 - \gamma} . $$

In order to relate $\Delta t$ and $m_0$, one can use $\Delta t = x t_0$, where $x$ is an
arbitrary (small) constant that simply states how large $\Delta t$ is compared to $t_0$, and
$m_0 \sim t_0$, which states that the number of messages is proportional to time assuming
stationary activity. Using these two arguments we obtain:

$$ \left\langle [r(m_0) - \langle r(m_0) \rangle]^2 \right\rangle \sim \frac{\sigma_\mu^2}{m_0^2} \, x^{2 - \gamma} \, (t_0)^{2 - \gamma} \sim \sigma_\mu^2 \, m_0^{-\gamma} , $$

$$ \sigma(m_0) \sim \sigma_\mu \, m_0^{-\gamma / 2} . $$

Comparing with Eq. (2), we finally obtain $\beta = \gamma / 2$, and with $\gamma = 2 - 2H$:

$$ \beta = 1 - H . \tag{5} $$

Equation (5) is a scaling law formalizing the relation between growth and long-term
correlations in the activity and is confirmed by our data. For OC1 we measured
$\beta_{\rm OC1} \approx 0.22$, yielding $H_{\rm OC1} \approx 0.78$ from Eq. (5), which is in
approximate agreement with the (maximum) exponent we obtained by direct measurements for OC1
($H = 0.75 \pm 0.05$ from Fig. 3b). For OC2 we obtained $\beta_{\rm OC2} \approx 0.17$ and
therefore $H_{\rm OC2} \approx 0.83$ through Eq. (5), which is not too far from the (maximum)
exponent found by direct measurements for OC2 ($H = 0.88 \pm 0.03$).
According to Eq. (5), the original Gibrat's law ($\beta_G = 0$) corresponds to very strong
long-term correlations with $H_G = 1$. This is the case when the activity on all time scales
exhibits equally strong correlations. In contrast, $\beta_{\rm rnd} = 1/2$ represents
completely random activity ($H_{\rm rnd} = 1/2$), as obtained for the randomized data in
Fig. 3b,d.
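The comparison between Eq. (5) and the directly measured exponents amounts to simple arithmetic, which can be written out as follows. The numerical values are those quoted in the text; the function name is an illustrative choice.

```python
def hurst_from_beta(beta):
    """Fluctuation exponent predicted by Eq. (5), beta = 1 - H."""
    return 1.0 - beta

# (beta from the growth analysis, H from direct DFA measurement), as quoted in the text
measured = {"OC1": (0.22, 0.75), "OC2": (0.17, 0.88)}
for name, (beta, h_direct) in measured.items():
    h_predicted = hurst_from_beta(beta)
    print(f"{name}: predicted H = {h_predicted:.2f}, measured H = {h_direct:.2f}, "
          f"difference = {abs(h_predicted - h_direct):.2f}")
```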

The mathematical framework relating long-term correlations quantified by $H$ and the
growth fluctuations quantified by $\beta$ could be relevant to other complex systems. While the
generalized version of Gibrat's law has been reported for economic indicators displaying
$\beta \approx 0.2$ [16, 17, 18], the origin of this scaling law is not clear and is still
being investigated. Our results suggest that the value of $\beta$ could be explained by the
existence of long-term correlations in the activity of the corresponding system, ranging from
firms and markets to social and population dynamics. In turn, Eq. (5) establishes a missing
link between studies of growth processes in economic or social systems [16, 17, 18] and studies
of long-term correlations such as in finance and the economy [23], Ethernet traffic [24], human
brain [25] or motor activity [26]. Our results suggest that systems involving other types
of human interactions, such as various Internet activities, communication via cell phones,
trading activity, etc., may display growth and correlation properties similar to those found
here, offering the possibility of explaining their dynamics in terms of the long-term
persistence of the individuals' behavior.
Growth of the degree in the underlying social network

Communication among the members of a community represents a type of social interaction
that defines a network, where a message is sent either based on an existing relation
between two members or establishing a new one. There is considerable interest in the origin
of broad distributions of activity in social systems. Two paradigms have been invoked for
various applications in social systems: the "rich-get-richer" idea used by Simon in 1955 [27]
and the models based on optimization strategies as proposed by Mandelbrot [28]. Regarding
network models, the preferential attachment (PA) model has been introduced [29] to
generate a type of stochastic scale-free network with a power-law degree distribution in the
network topology. Considering the social network of members linked when they exchange at
least one message (that has not been sent before), we examine the dynamics of the number
of outgoing links of each member [the out-degree $k(t)$] in analogy to Eq. (2).

We start from a set of nodes without links, consisting of all the members in the community,
and chronologically add a directed link between two members when a message is sent. In
analogy to the growth in the number of messages $m(t)$ of each member, we study the growth
of the members' out-degree $k(t)$, i.e. the number of links to others. We define the growth
rate of every member as

$$ r_k = \ln \frac{k_1}{k_0} , \tag{6} $$

where $k_0 = k(t_0)$ is the out-degree of a member at time $t_0$ and $k_1 = k(t_1)$ is the
out-degree at time $t_1$. Again, there is a growth rate for each member $j$, but for better
readability we skip the index. In Fig. 4 we study $\langle r_k(k_0) \rangle$, the average
growth rate conditional to the initial out-degree $k_0$, and $\sigma_k(k_0)$, the standard
deviation of the growth rate conditional to the initial out-degree $k_0$, for OC1 and OC2. We
obtain an almost constant average growth $\langle r_k(k_0) \rangle$ as a function of $k_0$, as
in the study of messages.

The conditional standard deviation of the network degree, $\sigma_k(k_0)$, is shown in Fig. 4
for both social communities. We obtain a power-law relation analogous to Eq. (2):

$$ \sigma_k(k_0) \sim k_0^{-\beta_k} , \tag{7} $$

with fluctuation exponents very similar to those found for the number of messages, namely
$\beta_{k,{\rm OC1}} = 0.22 \pm 0.02$ for OC1 and $\beta_{k,{\rm OC2}} = 0.17 \pm 0.08$ for OC2.
These values are consistent with those we obtained for the activity of sending messages.
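The construction of the out-degree from the message log can be sketched as follows. The sketch assumes that the log is a list of (time, sender, receiver) entries and counts, for each sender, the distinct receivers contacted up to a given time; the function names and the small example log are illustrative.

```python
import math
from collections import defaultdict

def out_degree_at(messages, t):
    """Out-degree k(t): number of distinct receivers each sender has contacted up to time t."""
    contacts = defaultdict(set)
    for time, sender, receiver in sorted(messages):
        if time > t:
            break
        contacts[sender].add(receiver)     # repeated messages add no new link
    return {member: len(receivers) for member, receivers in contacts.items()}

def degree_growth_rates(messages, t0, t1):
    """Growth rate r_k = ln(k1 / k0), Eq. (6), for members with k0 > 0."""
    k0 = out_degree_at(messages, t0)
    k1 = out_degree_at(messages, t1)
    return {member: math.log(k1[member] / k0[member]) for member in k0}

# Hypothetical example log: (time, sender, receiver)
log = [(1, "a", "b"), (2, "a", "c"), (4, "b", "a"), (6, "c", "d"), (7, "a", "b")]
print(out_degree_at(log, 4))
print(degree_growth_rates(log, 4, 7))
```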






Next, we consider the preferential attachment model, which has been introduced to generate
scale-free networks [29] with a power-law degree distribution $P(k)$ of the type investigated
in the present study. Essentially, it consists of subsequently adding nodes to the network by
linking them to existing nodes, which are chosen randomly with a probability proportional
to their degree. We consider the undirected network and study the degree growth properties
using Eqs. (6) and (7), calculating the conditional average growth rate
$\langle r_{\rm PA}(k_0) \rangle$ and the conditional standard deviation
$\sigma_{\rm PA}(k_0)$. The times $t_0$ and $t_1$ are defined by the number of
nodes attached to the network. Figure 2 in SI Sec. IV shows the results, where an average
degree $\langle k \rangle = 20$, 50,000 nodes at $t_0$, and 100,000 nodes at $t_1$ were chosen.
We find a constant average growth rate that does not depend on the initial degree $k_0$. The
conditional standard deviation is a function of $k_0$ and exhibits a power-law decay
characterized by Eq. (7), or equivalently Eq. (2), with $\beta_{\rm PA} = 1/2$. The value
$\beta_{\rm PA} = 1/2$ in Eq. (5) corresponds to $H = 1/2$, indicating complete randomness.
There is no memory in the system. Since each addition of a new node is completely independent
of the preceding ones, there cannot be temporal correlations in the activity of adding links.
Therefore, a purely preferential attachment type of growth is not sufficient to describe the
social network dynamics found in the present study, and further temporal correlations have to
be incorporated according to Eq. (3).

For the PA model it has been shown that the degree of each node grows in time as
$k(t) \sim (t / t^*)^{b}$, where $t^*$ is the time when the corresponding node was introduced
to the system and $b = 1/2$ is the dynamics exponent in growing network models [30].
Accordingly, the growth rate is given by $r_{\rm PA} = b \ln \frac{t_1}{t_0}$, which is
constant and independent of $k_0$, in accordance with our numerical findings. Furthermore, in
SI Sec. IV we obtain analytically the exponent $\beta_{\rm PA} = 1/2$, confirming the numerical
results as well. Interestingly, an extension of the standard PA model has been proposed [31]
that takes into account different fitnesses of the nodes for acquiring links, involving a
distribution of $b$-exponents and therefore a distribution of growth rates. This model opens
the possibility of relating the distribution of fitness values to the fluctuations in the
growth rates, a point that requires further investigation.
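A small simulation illustrates the growth of node degrees in the preferential attachment model. The sketch below is one common way to implement the model (a list of link ends from which targets are drawn uniformly) and starts from a small complete graph; these choices, the function name, and the reduced network sizes are assumptions made for illustration and do not reproduce the simulation reported in the SI.

```python
import math
import random

def pa_degrees(n0, n1, m, seed=0):
    """Grow a preferential attachment network to n1 nodes with m links per new node.

    Returns the degrees of the first n0 nodes at network sizes n0 and n1.
    """
    rng = random.Random(seed)
    degree = [m] * (m + 1)                               # complete graph on m + 1 nodes
    stubs = [node for node in range(m + 1) for _ in range(m)]  # one entry per link end
    k0 = None
    for new in range(m + 1, n1):
        if new == n0:
            k0 = list(degree)                            # snapshot at "time" t0
        targets = set()
        while len(targets) < m:                          # m distinct existing nodes,
            targets.add(rng.choice(stubs))               # chosen proportionally to degree
        for old in targets:
            degree[old] += 1
            stubs.append(old)
        degree.append(m)
        stubs.extend([new] * m)
    return k0, degree[:n0]

k0, k1 = pa_degrees(2000, 4000, 5, seed=1)
rates = [math.log(b / a) for a, b in zip(k0, k1)]        # growth rates, Eq. (6)
# The mean rate is expected to lie near b * ln(t1 / t0) = 0.5 * ln(2).
print(sum(rates) / len(rates), 0.5 * math.log(2))
```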



III. DISCUSSION 

From a statistical physics point of view, the finding of long-term correlations opens the
question of the origin of such a persistence pattern in the communication. At this point we
speculate on two possible scenarios, which require further studies. The question is whether
the finding of an exponent $H > 0.5$ is due to a power-law (Lévy type) distribution
in the time interval between two messages of the same person or just to pure correlations
or long-term memory in the activity of people [32, 33]. In the first scenario, the intervals
between the messages follow a power law [13]. Accordingly, the activity pattern comprises many
short intervals and few long ones, implying persistent epochs of small and large activity.
This fractal-like activity leads to long-term correlations with $H > 1/2$ (see the analogous
problem of the origin of long-term correlations in DNA sequences as discussed in [33]). This
scenario implies a direct link between the correlations and the distribution of inter-event
intervals, which can be obtained analytically. In the second scenario, the intervals between
the messages do not follow a Lévy type distribution, but the values of the time intervals
are not independent of each other, again representing long-term persistence. For example,
the distribution of inter-event times could be a stretched exponential (see recent work on the
study of extreme events in climatological records exhibiting long-term correlations).
Thus, deciding between these two possible scenarios for the origin of correlations in activity
requires an extended analysis of inter-event intervals as well as correlations to determine
whether the behavior is Lévy-like or pure-memory-like. A careful statistical analysis is
needed, which will be the focus of future research.

To some extent, the persistent nature of human interactions enables the prediction of
the actants' activity. Our finding implies that traditional mean-field approximations, based
on the assumption that the particular type of human activity under study can be treated
as a large number of independent random events (Poisson statistics), may result in faulty
predictions. In contrast, from the growth properties found here, one can estimate
the probability for members of a certain activity level to send more than a given number of
messages in the future. This result may help to improve the allocation of resources
in communication-based systems ranging from economic markets to political systems. As a
byproduct, our finding that the activity of sending messages exhibits long-term persistence
suggests the existence of an underlying long-term correlated process. This can be understood
as an unknown individual state, driven by various internal and external stimuli, that
provides the probability to send messages. In addition, the memory in activity found
here could be the origin of the long-term persistence found in other records representing a
superposition of the individuals' behavior, such as Ethernet traffic [24], highway traffic,
stock markets, and so forth.



IV. MATERIALS AND METHODS 

Calculation of $\langle r(m_0) \rangle$, $\sigma(m_0)$ and optimal times $t_0$ and $t_1$

The average growth rate, $\langle r(m_0) \rangle$, and the standard deviation,
$\sigma(m_0) = \sqrt{\langle r(m_0)^2 \rangle - \langle r(m_0) \rangle^2}$, are defined as
follows. Calling $P(r|m_0)$ the conditional probability density of finding a member with
growth rate $r(m_0)$ under the condition of an initial number of messages $m_0$, we obtain:

$$ \langle r(m_0) \rangle = \int r \, P(r|m_0) \, dr , \tag{8} $$

and

$$ \langle r(m_0)^2 \rangle = \int r^2 \, P(r|m_0) \, dr . \tag{9} $$

In order to calculate the growth rate, Eq. (1), one has to choose the times $t_0$ and $t_1$ in
the period of data acquisition $T$. Naturally, it is best to use all data in order to have
optimal statistics. Accordingly, $t_1$ is best chosen at the end of the available data
($t_1 = T$). We argue that if $t_0$ is chosen too small, then $m(t_0)$ is zero for many members
(those that send their messages later), which are then rejected in the calculation because of
the division in Eq. (1). Conversely, if $t_0$ is chosen too large, then there is not enough
time to observe the members' activity and $r = 0$ will occur frequently, indicating no change
(members have sent their messages before). Thus, there must be an optimal time in between. In
SI Sec. II, Fig. 1, we plot, as a function of $t_0$, the number of members that have sent at
least one message until $t_0$ [$m_0 > 0$] and further exhibit at least some activity until
$t_1 = T$ [$m_1 - m_0 > 0$]. For both online communities we find an optimal $t_0$ in the middle
of the period of observation, $t_0 = T/2$, a value that is used for the analysis in the main
text.
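The search for the optimal $t_0$ can be sketched as a simple count. The sketch assumes that the sending times of each member are available as a list, that time is measured in integer steps, and that $t_1 = T$; the function name and the example data are hypothetical.

```python
import numpy as np

def optimal_t0(send_times_by_member, T):
    """Count, for each t0 in 1..T-1, the members with m0 > 0 and m1 - m0 > 0 (t1 = T)."""
    candidates = np.arange(1, T)
    counts = np.zeros(len(candidates), dtype=int)
    for times in send_times_by_member.values():
        first, last = min(times), max(times)
        # m0 > 0 requires first <= t0; m1 - m0 > 0 requires a message after t0
        counts += (candidates >= first) & (candidates < last)
    return candidates[np.argmax(counts)], counts

# Hypothetical sending times of four members in a period of T = 10 time steps
example = {"a": [1, 2, 7], "b": [4], "c": [6, 9], "d": [3, 5]}
best_t0, counts = optimal_t0(example, 10)
print(best_t0, counts)
```

If several values of $t_0$ yield the same maximal count, this sketch returns the earliest one.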

Shuffling of the message data 

The raw data comprise one entry for each message, consisting of the time when the
message is sent, the sender identifier and the receiver identifier. For example:

    time    sender    receiver
    1       a         b
    2       a         c
    4       b         a
    6       c         d
    7       a         b

This means that at t = 1 member a sends a message to member b, at t = 2 member a sends a
message to member c, and so on.

The randomized surrogate data set is created by randomly swapping the instants (times)
at which the messages are sent between two events chosen at random. Thus, each message
entry randomly obtains the time of another one. This means that the total number of messages
is preserved and the associations between messages and times get shuffled. Temporal
correlations are destroyed, but the set of instants at which the messages are sent remains
unchanged. For instance, swapping the events at t = 1 and t = 6 results in: t = 1, c sends to
d, and t = 6, a sends to b.
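The randomization can be sketched in a few lines. Instead of repeated pairwise swaps, the sketch applies a single random permutation of all time stamps, which is assumed here to be equivalent for the purpose of destroying temporal correlations; the function name and the seed are illustrative.

```python
import random

def shuffle_times(messages, seed=None):
    """Randomly reassign the time stamps among (time, sender, receiver) entries."""
    rng = random.Random(seed)
    times = [time for time, _, _ in messages]
    rng.shuffle(times)                     # permute the instants among the entries
    return sorted((time, sender, receiver)
                  for time, (_, sender, receiver) in zip(times, messages))

raw = [(1, "a", "b"), (2, "a", "c"), (4, "b", "a"), (6, "c", "d"), (7, "a", "b")]
surrogate = shuffle_times(raw, seed=0)
print(surrogate)
```

The set of instants and the number of messages of each sender-receiver pair are preserved; only the assignment of instants to messages changes.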
FIG. 1: A typical example of an individual's message activity. a, Instants at which
messages were sent by a member belonging to OC1. b, Cumulative number of messages $m(t)$
(green) and the same but with the messages placed at random (brown). c, Sequence of the number
of messages sent per day, $\mu(t)$, for the same individual. d,e,f, Color-coded sequences
$\mu(t)$ for members sending M = 100; 1,000; or 10,000 messages overall, respectively. The
color is proportional to the logarithm of the number of messages per day (red: 1 message,
blue: 400 messages, white: no message).
FIG. 2: Average and standard deviation of the growth rate versus number of messages.
a, Results for OC1. The average growth rate of messages conditional to $m_0$ is almost constant
and the standard deviation decays with an exponent $\beta_{\rm OC1} = 0.22 \pm 0.01$. b, Results
for OC2. The standard deviation conditional to $m_0$ decays with an exponent
$\beta_{\rm OC2} = 0.17 \pm 0.03$. c, Results for OC1 when the messages are shuffled,
displaying $\beta_{\rm rnd} = 1/2$. d, Results for OC2 when the messages are shuffled. In all
cases $t_0$ corresponds to half of the period of data acquisition and $t_1$ to the end, which
we found to provide optimal statistics (see SI Fig. 1).
FIG. 3: Long-term correlations in the message activity of OC1 (a and b) and OC2 (c
and d). a, DFA fluctuation functions averaged conditional to $M$, the total number of messages
sent by each member (black: 1-2, red: 3-7, green: 8-20, blue: 21-54, orange: 55-148, brown:
149-403, maroon: 404-1096, violet: 1097-2980, turquoise: 2981-8103). The dotted lines serve as
guides; the one at the bottom corresponds to the uncorrelated case, while the one at the top
corresponds to the exponent 0.75. b, Fluctuation exponent $H$ measured from panel a on the
scales 10 days < $\Delta t$ < 63 days as a function of the total number of messages sent, $M$,
for real (blue) and individually shuffled (green) records. c, DFA fluctuation functions
averaged conditional to $M$ [colors as in (a)]. The dotted lines correspond to the uncorrelated
case (bottom) and to the exponent 1 (top). d, Fluctuation exponents obtained from panel c on
the scales 32 days < $\Delta t$ < 200 days as a function of the total number of messages sent,
$M$. Due to weak statistics causing large error bars, we do not consider the last two values
for $M > 500$ as reliable. For clarity, the fluctuation functions in panels a and c are shifted
vertically.




FIG. 4: Mean out-degree growth rate and standard deviation versus initial out-degree.
a, Results for OC1. The average growth of the out-degree conditional to the out-degree at $t_0$
is almost constant. The standard deviation decays with an exponent
$\beta_{k,{\rm OC1}} = 0.22 \pm 0.02$. b, Results for OC2. The standard deviation conditional
to the out-degree at $t_0$ decays with an exponent $\beta_{k,{\rm OC2}} = 0.17 \pm 0.08$. The
quantities are analogous to those of Fig. 2, except that here the growth rate of the out-degree
is considered instead of the number of messages sent.
SUPPORTING INFORMATION (SI)



Scaling laws of human interaction activity 

Diego Rybski, Sergey V. Buldyrev, Shlomo Havlin, 
Fredrik Liljeros, and Hernan A. Makse



I. NOTATION 

1. Member $j$ sends his/her $n$th message at time $t_j(n)$, where $1 \le n \le M_j$ and $M_j$ is the
total number of messages sent by $j$ in the time of data acquisition $T$. The sequence
of counts, defined as the number of messages in the period $\delta t$, is given by

$$\mu_j^{\delta t}(t) = \sum_{n,\; t_j(n) \in [t,\, t+\delta t]} a_j(n) , \qquad (10)$$

where $a_j(n) = 1$. In addition, the periods are non-overlapping, $t = i\,\delta t$ with integer $i$,
and therefore $1 \le t_j(n) \le T$. In the case of daily resolution, $\delta t = 1$ day.

2. The cumulative number of messages that a member sends until time $t$ is:

$$m_j^{\delta t}(t) = \sum_{t'=1}^{t} \mu_j^{\delta t}(t') . \qquad (11)$$

In particular, $m_j(1) = 0$ and $m_j(T) = M_j$.

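3. The displacement of the random walk is the cumulative sum of the normalized $\mu_j^{\delta t}(t)$:

$$Y_j^{\delta t}(t) = \sum_{t'=1}^{t} \left[ \mu_j^{\delta t}(t') - \langle \mu_j^{\delta t}(t) \rangle \right] , \qquad (12)$$

where $\langle \mu_j^{\delta t}(t) \rangle$ is the average of $\mu_j^{\delta t}(t)$ in time $t$. The root-mean-square displacement
after $\Delta t$ is defined as

$$F_j^{\delta t}(\Delta t) = \left\langle \left[ Y_j^{\delta t}(t + \Delta t) - Y_j^{\delta t}(t) \right]^2 \right\rangle^{1/2} , \qquad (13)$$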

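where the average is performed over the time $t$. Additionally, we perform an average
over members $j$ with activity level $M$ and define

$$\left[ F^{\delta t}(\Delta t) \right]^2 = \left\langle \left[ F_j^{\delta t}(\Delta t) \right]^2 \,\middle|\, M \right\rangle_j . \qquad (14)$$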
FIG. 5: Optimal times $t_0$ and $t_1$. The panels show for a, OC1, and b, OC2, the number of
members with both $m_0 > 0$ and $m_1 - m_0 > 0$. While $t_1$ obviously is optimal at the end of the
period, $t_0$ is varied to find the value for which the number of members with at least one message
until $t_0$ and at least one new message between $t_0$ and $t_1$ is maximal.

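4. For simplicity, in the main text we skip the index $j$ as well as $\delta t$ and write $\mu(t)$, $m(t)$,
$Y(t)$, as well as $F(\Delta t)$.

5. To investigate the growth in the number of messages we use the quantities $r = \ln(m_1/m_0)$,
$\langle r(m_0) \rangle$, $\sigma(m_0)$ and the exponents $\beta_{\mathrm{OC1}}$, $\beta_{\mathrm{OC2}}$, $\beta_g$, $\beta_{\mathrm{rnd}}$.

6. To investigate the growth of the degree we use the quantities $r_k = \ln(k_1/k_0)$, $\langle r_k(k_0) \rangle$,
$\sigma_k(k_0)$ and the exponents $\beta_{k,\mathrm{OC1}}$, $\beta_{k,\mathrm{OC2}}$.

7. For the growth of the degree in the preferential attachment model we use the quantities
$r_{\mathrm{PA}} = \ln(k_1/k_0)$, $\langle r_{\mathrm{PA}}(k_0) \rangle$, $\sigma_{\mathrm{PA}}(k_0)$ and the exponent $\beta_{\mathrm{PA}}$.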
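
As an illustration of the notation above, the following minimal sketch computes the counts, the cumulative number of messages, and the growth rate for a single member. It assumes integer time stamps between 1 and T; the function names (`daily_counts`, `cumulative`, `growth_rate`) are hypothetical and not taken from the original analysis.

```python
import math

def daily_counts(timestamps, T, dt=1):
    """Counts mu(t): number of messages in each non-overlapping period of length dt.

    Assumes integer time stamps t_j(n) with 1 <= t_j(n) <= T.
    """
    n_bins = T // dt
    mu = [0] * n_bins
    for t in timestamps:
        idx = (t - 1) // dt          # period that contains time stamp t
        if 0 <= idx < n_bins:
            mu[idx] += 1             # each message contributes a_j(n) = 1
    return mu

def cumulative(mu):
    """Cumulative number of messages m(t), the running sum of mu(t)."""
    m, total = [], 0
    for x in mu:
        total += x
        m.append(total)
    return m

def growth_rate(m, t0, t1):
    """Growth rate r = ln(m1/m0) between periods t0 and t1 (1-based).

    Returns None unless m0 > 0 and m1 - m0 > 0.
    """
    m0, m1 = m[t0 - 1], m[t1 - 1]
    if m0 <= 0 or m1 - m0 <= 0:
        return None
    return math.log(m1 / m0)

# Hypothetical member with six messages in T = 8 days.
mu = daily_counts([1, 1, 3, 6, 6, 8], T=8)
m = cumulative(mu)
r = growth_rate(m, t0=4, t1=8)
```

Here `m[-1]` equals the total number of messages M of the member, and `r` is only defined for members who were active both before and after `t0`.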

II. OPTIMAL TIMES $t_0$ AND $t_1$

Figure 5 displays the optimal times $t_0$ and $t_1$ to calculate the growth rates for OC1
(panel a) and OC2 (panel b).
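
The selection of $t_0$ described in the caption of Fig. 5 can be sketched as a simple scan over candidate values. This is a hedged illustration, assuming each member is represented by a list of cumulative message counts; the name `optimal_t0` is hypothetical.

```python
def optimal_t0(cumulative_by_member, t1):
    """Scan t0 = 1 .. t1-1 and return (t0, count) with the largest number of
    members satisfying m(t0) > 0 and m(t1) - m(t0) > 0.

    cumulative_by_member: list of lists, one cumulative record m(t) per member,
    indexed so that m[t - 1] is the value at time t.
    """
    best_t0, best_count = None, -1
    for t0 in range(1, t1):
        count = sum(
            1
            for m in cumulative_by_member
            if m[t0 - 1] > 0 and m[t1 - 1] - m[t0 - 1] > 0
        )
        if count > best_count:       # ties keep the earliest t0
            best_t0, best_count = t0, count
    return best_t0, best_count

# Three hypothetical members observed over T = 4 periods.
members = [
    [1, 1, 2, 2],
    [0, 1, 1, 2],
    [0, 0, 1, 1],
]
t0, n_members = optimal_t0(members, t1=4)
```

In this toy example the count is 1, 2, and 1 for `t0` equal to 1, 2, and 3, so the scan selects `t0 = 2`.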
III. DETAILS ON THE QUANTIFICATION OF LONG-TERM CORRELATIONS
USING DETRENDED FLUCTUATION ANALYSIS

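Statistical dependencies between the values of a record $\mu(t)$ with $t = 1, \dots, T$ can be
characterized by the auto-correlation function

$$C(\Delta t) = \frac{1}{\sigma_\mu^2\,(T - \Delta t)} \sum_{t=1}^{T-\Delta t} \left[ \mu(t) - \langle \mu(t) \rangle \right] \left[ \mu(t + \Delta t) - \langle \mu(t) \rangle \right] , \qquad (15)$$

where $T$ is the length of the record $\mu(t)$, $\langle \mu(t) \rangle$ its average, and $\sigma_\mu$ its standard deviation.
For uncorrelated values of $\mu(t)$, $C(\Delta t)$ is zero for $\Delta t > 0$, because on average positive
and negative products will cancel each other out. In the case of short-term correlations
$C(\Delta t)$ has a characteristic decay time $\Delta t_\times$. A prominent example is the exponential decay
$C(\Delta t) \sim \exp(-\Delta t/\Delta t_\times)$. Long-term correlations are described by a slower decay
with diverging $\Delta t_\times$, namely a power law,

$$C(\Delta t) \sim (\Delta t)^{-\gamma} , \qquad (16)$$

with the correlation exponent $0 < \gamma < 1$.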
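
For concreteness, the auto-correlation function can be estimated directly from a record as in the following sketch. It assumes the standard normalization by the variance and by the number of available pairs; the function name `autocorrelation` is hypothetical.

```python
def autocorrelation(mu, dt):
    """Estimate C(dt) for a record mu(t), t = 1..T, stored as a list.

    Uses the mean and the (population) variance of the whole record and
    averages the products over the T - dt available pairs.
    """
    T = len(mu)
    if not 0 <= dt < T:
        raise ValueError("dt must satisfy 0 <= dt < T")
    mean = sum(mu) / T
    var = sum((x - mean) ** 2 for x in mu) / T
    if var == 0:
        raise ValueError("record is constant; C(dt) is undefined")
    products = sum((mu[t] - mean) * (mu[t + dt] - mean) for t in range(T - dt))
    return products / (var * (T - dt))

# A strictly alternating record is perfectly anti-correlated at lag 1.
record = [1, 0, 1, 0, 1, 0]
c0 = autocorrelation(record, 0)
c1 = autocorrelation(record, 1)
c2 = autocorrelation(record, 2)
```

For real message records the estimate becomes noisy at large lags, which is one motivation for the DFA approach described next.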

Detrended Fluctuation Analysis (DFA) is a well-studied method to quantify long-term
correlations in the presence of non-stationarities [21]. The analysis of a considered record
of length $T$ consists of 5 steps:

1. Calculate the cumulative sum, the so-called profile:

$$Y(t) = \sum_{t'=1}^{t} \left( \mu(t') - \langle \mu(t) \rangle \right) . \qquad (17)$$

2. Separate the profile $Y(t)$ into $T_{\Delta t} = \mathrm{int}(T/\Delta t)$ segments of length $\Delta t$. Often, the length
of the record is not a multiple of $\Delta t$. In order not to disregard information, the
segmentation procedure is repeated starting from the end of the record and one obtains
$2T_{\Delta t}$ segments.

3. Locally detrend each segment $\nu$ by determining the best polynomial fit $p_\nu^{(n)}(t)$ of order $n$
and subsequently subtracting it from the profile:

$$Y_{\Delta t}(t) = Y(t) - p_\nu^{(n)}(t) . \qquad (18)$$
4. Calculate for each segment the variance (squared residuals) of the detrended $Y_{\Delta t}(t)$

$$F_{\Delta t}^2(\nu) = \frac{1}{\Delta t} \sum_{j=1}^{\Delta t} Y_{\Delta t}^2\left[ (\nu - 1)\Delta t + j \right] \qquad (19)$$

by averaging over all values in the corresponding $\nu$th segment.

5. The DFA fluctuation function is given by the square root of the average over all segments:

$$F(\Delta t) = \left[ \frac{1}{2T_{\Delta t}} \sum_{\nu=1}^{2T_{\Delta t}} F_{\Delta t}^2(\nu) \right]^{1/2} . \qquad (20)$$

The averaging of $F_{\Delta t}^2(\nu)$ is additionally performed over members of similar activity
level $M$.
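
The five steps can be sketched for a single record as follows. This is a minimal pure-Python illustration, not the code used for the paper: the polynomial fit is done with a small normal-equation solver, and the names `polyfit_residuals` and `dfa` are hypothetical. The averaging over members of similar activity level is not included.

```python
import math

def polyfit_residuals(y, order):
    """Residuals of a least-squares polynomial fit of the given order to y."""
    n = len(y)
    xs = [2.0 * i / (n - 1) - 1.0 for i in range(n)]   # rescale to [-1, 1]
    k = order + 1
    # Normal equations A c = b for the polynomial coefficients c.
    A = [[sum(x ** (p + q) for x in xs) for q in range(k)] for p in range(k)]
    b = [sum(yi * x ** p for x, yi in zip(xs, y)) for p in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, k))
        coef[r] = s / A[r][r]
    return [yi - sum(coef[p] * x ** p for p in range(k)) for x, yi in zip(xs, y)]

def dfa(mu, dt, order=2):
    """DFA fluctuation function F(dt) of a record mu with detrending order n."""
    T = len(mu)
    if dt < order + 2 or dt > T:
        raise ValueError("need order + 2 <= dt <= T")
    # Step 1: profile (cumulative sum of the mean-subtracted record).
    mean = sum(mu) / T
    profile, total = [], 0.0
    for x in mu:
        total += x - mean
        profile.append(total)
    # Step 2: segments from the start and from the end of the record.
    n_seg = T // dt
    starts = [v * dt for v in range(n_seg)]
    starts += [T - (v + 1) * dt for v in range(n_seg)]
    # Steps 3 and 4: detrend each segment and take the mean squared residual.
    variances = []
    for s in starts:
        res = polyfit_residuals(profile[s:s + dt], order)
        variances.append(sum(r * r for r in res) / dt)
    # Step 5: square root of the average over all 2 * n_seg segments.
    return math.sqrt(sum(variances) / len(variances))

# A linear record has a parabolic profile, which DFA2 removes entirely.
f_trend = dfa(list(range(1, 41)), dt=8, order=2)
f_alternating = dfa([1, 0] * 32, dt=8, order=2)
```

The linear test record illustrates the detrending capability: its profile is a parabola, so the order-2 fit leaves residuals that vanish up to rounding error.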

If the record $\mu(t)$ is long-term correlated according to a power-law decaying auto-correlation
function, Eq. (16), then $F(\Delta t)$ increases for large scales $\Delta t$ also as a power law:

$$F(\Delta t) \sim (\Delta t)^H , \qquad (21)$$

where the fluctuation exponent $H$ is analogous to the well-known Hurst exponent. The
exponents are related via

$$H = 1 - \gamma/2 , \qquad \gamma = 2 - 2H . \qquad (22)$$

When $\gamma = 1$ then $H_{\mathrm{rnd}} = 1/2$, which is the case of uncorrelated dynamics. If the correlations
decay faster, with $\gamma > 1$, then the random exponent $H_{\mathrm{rnd}} = 1/2$ is still recovered. Long-term
correlations imply $0 < \gamma < 1$ and $1/2 < H < 1$. In practice, one plots $F(\Delta t)$ versus $\Delta t$ in
double-logarithmic representation, determines the exponent $H$ on large scales and quantifies
the correlation exponent $\gamma$. The order of the polynomials $p^{(n)}$ determines the detrending
technique, which is named DFAn: DFA0 for constant detrending, DFA1 for linear, DFA2 for
parabolic, etc.

The subtraction of the average in Eq. (17) is only necessary for DFA0. By definition
the corresponding fluctuation function is only given for $\Delta t \ge n + 2$. The detrending order
determines the capability of detrending. Since the local trends are subtracted from the
profile, only trends of order $n - 1$ are subtracted from the original record $\mu(t)$. Throughout
the paper we show the results using DFA2, which we found to be sufficient in terms of
detrending.
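
Determining $H$ from the double-logarithmic plot amounts to a straight-line fit of $\ln F(\Delta t)$ against $\ln \Delta t$. The sketch below assumes an ordinary least-squares fit over a chosen range of scales; the function name `fluctuation_exponent` and the example values are hypothetical.

```python
import math

def fluctuation_exponent(scales, fluctuations):
    """Least-squares slope H of ln F versus ln dt, and gamma = 2 - 2H."""
    xs = [math.log(s) for s in scales]
    ys = [math.log(f) for f in fluctuations]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    H = sxy / sxx
    return H, 2.0 - 2.0 * H

# Synthetic fluctuation function following an exact power law with H = 0.75.
scales = [10, 20, 40, 80]
F = [s ** 0.75 for s in scales]
H, gamma = fluctuation_exponent(scales, F)
```

For this synthetic input the fit recovers the imposed slope, and Eq. (22) then gives the corresponding correlation exponent.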
FIG. 6: Growth properties of the preferential attachment model [29] discussed in the main text.
We plot the average (black circles) and standard deviation (blue squares) of the growth rate $r_{\mathrm{PA}}$
conditional to $k_0$, the degree of the corresponding nodes at the first stage.

Since the fluctuation functions $F(\Delta t)$ for single users are very noisy, it is useful to average
fluctuation functions among various members. Thus, we first group the members in logarithmic
bins according to their activity level, the total number of messages $M$ sent. Namely,
we group all members that send 1-2, 3-7, 8-20, . . . messages in the period of data acquisition
by using bins determined by $b = \mathrm{int}(\ln M)$. Next we average the fluctuation function among
all members from each group $b$ and obtain for every activity level of the members one DFA
fluctuation function. The error bars in Fig. 3a,c of the main text were obtained by subdividing
each group and determining the standard deviations of the fluctuation exponents from
different groups of the same activity level.
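
The logarithmic binning can be sketched as below. The bin index follows $b = \mathrm{int}(\ln M)$ as stated; the averaging is implemented here as a root mean square of the individual fluctuation functions, which is one possible reading of the averaging of the squared quantities noted after Eq. (20), and a plain mean would be an alternative. The names `activity_bin` and `average_by_activity` are hypothetical.

```python
import math
from collections import defaultdict

def activity_bin(M):
    """Logarithmic bin index b = int(ln M) for a member with M >= 1 messages."""
    return int(math.log(M))

def average_by_activity(totals, fluct):
    """Average fluctuation functions over members within each activity bin.

    totals: dict member -> total number of messages M
    fluct:  dict member -> list of F(dt) values on a common set of scales
    Returns dict bin -> list of root-mean-square F(dt) values.
    """
    groups = defaultdict(list)
    for member, M in totals.items():
        groups[activity_bin(M)].append(fluct[member])
    averaged = {}
    for b, rows in groups.items():
        averaged[b] = [
            math.sqrt(sum(f * f for f in column) / len(column))
            for column in zip(*rows)   # one column per scale dt
        ]
    return averaged

# Bin edges implied by int(ln M): 1-2, 3-7, 8-20, 21-54, ...
edges = [activity_bin(M) for M in (1, 2, 3, 7, 8, 20, 21, 54, 55)]

# Hypothetical members and fluctuation values on two scales.
totals = {"a": 1, "b": 2, "c": 5}
fluct = {"a": [3.0, 4.0], "b": [4.0, 3.0], "c": [2.0, 5.0]}
averaged = average_by_activity(totals, fluct)
```

The list `edges` reproduces the bin boundaries quoted in the text, since $\ln 2 < 1 < \ln 3$, $\ln 7 < 2 < \ln 8$, and so on.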



IV. GROWTH IN THE DEGREE 

Figure 6 shows the results of the average growth rates and fluctuations of the growth
rates as a function of the initial degree for the preferential attachment model [29]. We find
a constant average growth rate and a standard deviation decreasing as a power law with
exponent $\beta_{\mathrm{PA}} = 1/2$ in Eq. (7) in the main text.

The PA network model has been described analytically. In particular, it has been shown
that each node's degree increases as

$$k(t) \sim \left( \frac{t}{t^*} \right)^{b} , \qquad (23)$$

where $t^*$ is the time when the corresponding node was introduced to the system and $b$ is
the dynamics exponent in growing network models ($b = 1/2$ for the standard PA) [30].
Accordingly, here the growth rate, Eq. (6) in the main text, is $r_{\mathrm{PA}} = b \ln(t_1/t_0)$, independent of $k_0$, which we also
find in Fig. 6.
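
A numerical experiment of the kind summarized in Fig. 6 can be sketched as follows. This is a simplified illustration, not the simulation used for the figure: it assumes that each new node attaches with a single link, measures time in the number of nodes, and uses the hypothetical function names `simulate_pa` and `mean_growth_by_k0`.

```python
import math
import random

def simulate_pa(n_nodes, t0, seed=0):
    """Grow a preferential attachment network with one link per new node.

    Returns the degree list when the network has t0 nodes (3 <= t0 <= n_nodes)
    and the degree list at the end.
    """
    rng = random.Random(seed)
    ends = [0, 1]        # every link contributes both of its end nodes
    degree = [1, 1]      # start from two connected nodes
    snapshot = None
    for new in range(2, n_nodes):
        old = rng.choice(ends)       # chosen with probability ~ degree
        degree[old] += 1
        degree.append(1)
        ends.extend([old, new])
        if len(degree) == t0:
            snapshot = list(degree)
    return snapshot, degree

def mean_growth_by_k0(snapshot, final):
    """Average growth rate ln(k1/k0) conditional on the initial degree k0."""
    groups = {}
    for k0, k1 in zip(snapshot, final):   # only nodes present at t0
        groups.setdefault(k0, []).append(math.log(k1 / k0))
    return {k0: sum(rs) / len(rs) for k0, rs in sorted(groups.items())}

snapshot, final = simulate_pa(n_nodes=2000, t0=1000, seed=1)
rates = mean_growth_by_k0(snapshot, final)
```

The conditional averages in `rates` can be compared with the prediction $b \ln(t_1/t_0)$ with $b = 1/2$; for a single small run the values for large $k_0$ rest on few nodes and fluctuate strongly.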

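To obtain $\sigma_{\mathrm{PA}}(k_0)$ one can use analogous considerations as for $\sigma(m_0)$ in the main text.
Due to Eq. (6) in the main text, here we have

$$r_{\mathrm{PA}} \approx \frac{1}{k_0} \sum_{t=1}^{\Delta t} \kappa(t) , \qquad (24)$$

where $\kappa(t)$ are small increments analogous to $\mu(t)$, whereas Eq. (23) implies

$$\kappa(t) \sim (\Delta t)^{-1/2} . \qquad (25)$$

As before, the conditional variance of the growth rate is

$$\left\langle \left[ r_{\mathrm{PA}}(k_0) - \langle r_{\mathrm{PA}}(k_0) \rangle \right]^2 \right\rangle \approx \frac{1}{k_0^2} \sum_{i=1}^{\Delta t} \sum_{j=1}^{\Delta t} \sigma_\kappa(i)\, \sigma_\kappa(j)\, C(j - i) . \qquad (26)$$

In the uncorrelated case, $C(j - i) = \delta_{ij}$, the double sum can be reduced to a single one:

$$\sigma_{\mathrm{PA}}^2(k_0) = \frac{1}{k_0^2} \sum_{i=1}^{\Delta t} \sigma_\kappa^2(i) . \qquad (27)$$

As shown below, $\sigma_\kappa^2(i) \sim i^{-1/2}$ and integration leads to

$$\sigma_{\mathrm{PA}}^2(k_0) \sim \frac{1}{k_0^2} \int_1^{\Delta t} i^{-1/2}\, \mathrm{d}i \qquad (28)$$

$$\sim \frac{1}{k_0^2} (\Delta t)^{1/2} . \qquad (29)$$

Eliminating $\Delta t$ using $k \sim t^{1/2}$, Eq. (23), that is $(\Delta t)^{1/2} \sim k_0$, one obtains

$$\sigma_{\mathrm{PA}}(k_0) \sim k_0^{-1/2} . \qquad (30)$$

That is, we obtain $\beta_{\mathrm{PA}} = 1/2$ as found numerically.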
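
The scaling argument can be checked numerically by evaluating the sum instead of the integral. The sketch below sets all prefactors to one, which is an assumption made only for illustration, so that $k_0 = (\Delta t)^{1/2}$ and $\sigma_\kappa^2(i) = i^{-1/2}$; the function names are hypothetical.

```python
import math

def sigma_pa(dt):
    """Return (k0, sigma_PA) for elapsed time dt, with all prefactors set to 1.

    Uses k0 = dt**0.5 and sigma_kappa^2(i) = i**-0.5 in the single sum
    sigma_PA^2 = (1 / k0^2) * sum_i sigma_kappa^2(i).
    """
    k0 = math.sqrt(dt)
    variance = sum(i ** -0.5 for i in range(1, dt + 1)) / k0 ** 2
    return k0, math.sqrt(variance)

def fitted_beta(dt_small, dt_large):
    """Exponent beta from the log-log slope of sigma_PA versus k0."""
    k_a, s_a = sigma_pa(dt_small)
    k_b, s_b = sigma_pa(dt_large)
    return -(math.log(s_b) - math.log(s_a)) / (math.log(k_b) - math.log(k_a))

beta = fitted_beta(10 ** 4, 10 ** 6)
```

The resulting `beta` lies slightly below 1/2 because the discrete sum differs from the integral by a constant offset that becomes negligible only for large $\Delta t$.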

It remains to show that $\sigma_\kappa^2(t) \sim t^{-1/2}$. We assume new links are set according to a Poisson
process, where every new link of a node represents an event. The intervals between these
events (asymptotically) follow an exponential distribution $p(t) = \lambda e^{-\lambda t}$. Accordingly, $\kappa(t)$
is a sequence of zeros, with a one only when a new link is set to the corresponding node. The
standard deviation of this sequence is

$$\sigma_\kappa \sim \lambda^{1/2} . \qquad (31)$$

Due to Eq. (23) the rate parameter decreases like

$$\lambda(t) \sim t^{-1/2} . \qquad (32)$$

Accordingly,

$$\sigma_\kappa(t) \sim t^{-1/4} . \qquad (33)$$
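
The relation between the rate and the standard deviation of a zero-one sequence can be illustrated with a short sketch. It assumes that, at unit time resolution and for a small rate, the sequence is well approximated by independent Bernoulli trials with success probability $\lambda$; the function names are hypothetical.

```python
import math
import random

def bernoulli_std(lam):
    """Exact standard deviation of a 0/1 variable with P(1) = lam."""
    return math.sqrt(lam * (1.0 - lam))

def empirical_std(lam, n, seed=0):
    """Standard deviation of a simulated 0/1 sequence of length n."""
    rng = random.Random(seed)
    xs = [1 if rng.random() < lam else 0 for _ in range(n)]
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

lam = 0.01
exact = bernoulli_std(lam)          # close to lam**0.5 for small lam
approx = math.sqrt(lam)
simulated = empirical_std(lam, n=200_000, seed=1)
```

For small $\lambda$ the factor $(1 - \lambda)^{1/2}$ is close to one, so the standard deviation scales as $\lambda^{1/2}$, in line with Eq. (31).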



In order to extend the standard PA model, a fitness model has been introduced [31],
taking into account different fitnesses of the nodes for acquiring links and therefore involving a
distribution of $b$-exponents. The spread of growth rates $r$ could be related to the distribution
of fitness. On the other hand, the growth according to Eq. (23) is superimposed with random
fluctuations that we characterize with the exponent $\beta$.